This tutorial provides an example of conducting cell-type deconvolution on oral squamous cell carcinoma (OSCC) spatial transcriptomics data from Arora and Cao et al, 2023 using the ZI-HGT + CARD. The ZI-HGT, or zero-inflated hierarchical generalized transformation model, is a technique to noisily transform spatial transcriptomics data by reducing zero-inflation, ties, and skewness to achieve approximate normality, and it enables uncertainty quantification. The ZI-HGT is combined here with CARD (Ma et al, 2022) to perform cell-type deconvolution.

Data

This tutorial uses the OSCC spatial transcriptomics data from Arora and Cao et al, 2023, for which the processed Seurat objects can be downloaded here (raw and SpaceRanger processed data are deposited in the National Center of Biotechnology Information’s Gene Expression Omnibus (GEO) database under accession code GSE208253). The single-cell deconvolution reference from Puram et al, 2017 may be found in the GEO database under accession code GSE103322.

We provide small subsets of these publicly available datasets used for demonstration purposes only.

Sources:

Please cite the original studies if using these data in any published work.

Spatial Transcriptomics and Spatial Locations

The spatial transcriptomics data should be in a matrix or sparseMatrix format, where rows are genes and columns are named locations. The spatial locations should be in a dataframe format with two columns, x and y, representing the spatial coordinates of each location. The row names of the spatial locations dataframe should match the column names of the spatial transcriptomics matrix.

spatial_count <- readRDS("Example_Data/example_spatial_counts.rds")
spatial_location <- readRDS("Example_Data/example_spatial_locations.rds")

as.matrix(spatial_count)[1:5,1:5]
##            CACTTCGCCACAGGCT-1 TACGTGCACTATGCTG-1 GCTAAACCTGAGGTGA-1
## AL627309.1                  0                  0                  0
## LINC01409                   0                  0                  0
## LINC01128                   0                  0                  0
## FAM41C                      0                  0                  0
## SAMD11                      0                  0                  0
##            CGAGCACTTCAAGTTT-1 TGGGAAATGCCTTTCC-1
## AL627309.1                  0                  0
## LINC01409                   0                  0
## LINC01128                   0                  0
## FAM41C                      0                  0
## SAMD11                      0                  0
head(spatial_location, 5)
##                     x  y
## CACTTCGCCACAGGCT-1 21 31
## TACGTGCACTATGCTG-1 21 33
## GCTAAACCTGAGGTGA-1 23 23
## CGAGCACTTCAAGTTT-1 23 25
## TGGGAAATGCCTTTCC-1 23 27

Single-Cell Reference and Metadata

The single-cell RNAseq reference gene expression data should be in a matrix or sparseMatrix format, where rows are genes and columns are named cells. The metadata should be in a dataframe format that includes a sample column (not required to be named sample) and a cell_type column (not required to be named cell_type), and the rows of the metadata should match the column names of the scRNAseq gene expression matrix.

sc_count <- readRDS("Example_Data/example_sc_counts.rds")
sc_meta <- readRDS("Example_Data/example_sc_metadata.rds")

as.matrix(sc_count)[1:5,1:5]
##          HN28_P15_D06_S330_comb HN28_P6_G05_S173_comb HN26_P14_D11_S239_comb
## C9orf152                 0.0000                0.0000                0.42761
## RPS11                    6.0037                7.3006                7.28850
## ELMO2                    0.0000                0.0000                0.00000
## CREB3L1                  0.0000                0.0000                0.00000
## PNMA1                    5.1474                5.3329                2.83370
##          HN26_P14_H05_S281_comb HN26_P25_H09_S189_comb
## C9orf152                 0.0000                0.00000
## RPS11                    0.0000                7.47420
## ELMO2                    5.2465                0.50487
## CREB3L1                  0.0000                0.00000
## PNMA1                    5.7507                0.19661
head(sc_meta, 5)
##                        sample     cell_type
## HN28_P15_D06_S330_comb   HN28 myofibroblast
## HN28_P6_G05_S173_comb    HN28 myofibroblast
## HN26_P14_D11_S239_comb   HN26   cancer cell
## HN26_P14_H05_S281_comb   HN26 myofibroblast
## HN26_P25_H09_S189_comb   HN26   cancer cell

Applying the ZI-HGT to Cell-Type Deconvolution with CARD

The ZI-HGT noisily transforms spatial transcriptomics data by reducing zero-inflation, ties, and skewness to achieve approximate normality through a type of zero-inflated Poisson. \(n\) posterior samples of the transformation are generated, and cell-type deconvolution with CARD is applied to each to generate a posterior distribution of the cell-type proportions. We suggest taking the mean of the posterior distribution as the cell-type proportions point estimate, and we provide 95% pointwise Bayesian credible intervals of each cell-type proportion at each location.

To run the ZI-HGT and CARD, source the functions in the HGTfunctions3.R script (available in this directory) and follow the example below.

# Source the needed functions
source("./HGTfunctions3.R")

# Run the ZI-HGT and CARD
 ZI_HGT_CARD_obj <- HGTplusCARD(
      sc_count = sc_count,
      sc_meta = sc_meta,
      spatial_count = spatial_count,
      spatial_location = spatial_location,
      ct.varname = "cell_type",
      ct.select = unique(sc_meta$cell_type),
      sample.varname = "sample",
      minCountGene = 100,
      minCountSpot = 5,
      alpha0 = 2,
      alpha1 = 0.5,
      n = 5,
      seed = 123456,
      doWAIC = FALSE)

The arguments to the HGTplusCARD function are as follows:

Results

The output of the HGTplusCARD function is a list containing seven elements:

We then recommend visualizing the results with HGT.CARD.visualize.prop and HGT.CARD.two.tone, which wrap CARD’s visualization functions to generate plots of the cell-type proportions and a binary plot showing proportions greater than a cutoff, respectively. In the example below, we use a cutoff of 0.1, which, assuming approximately 10 cells per location, indicates at least one cell of that type at the locationn.

ct_props_plot <- HGT.CARD.visualize.prop(
                  proportion = ZI_HGT_CARD_obj$cell_type_proportion_matrices$mean_ct_prop,
                  spatial_location = spatial_location,
                  ct.visualize = unique(sc_meta$cell_type),
                  colors = c("mediumpurple4","white","green4"),
                  NumCols = 7,
                  pointSize = 7.0)
print(ct_props_plot)

ct_props_two_tone_plot <- HGT.CARD.two.tone(
                            proportion = ZI_HGT_CARD_obj$cell_type_proportion_matrices$mean_ct_prop,
                            spatial_location = spatial_location,
                            ct.visualize = unique(sc_meta$cell_type),
                            colors = c("grey70","green4"),
                            NumCols = 7,
                            pointSize = 7.0,
                            cutoff = 0.1)                             
print(ct_props_two_tone_plot)

R Version and Dependencies

All analysis in this tutorial was performed in R version 4.2.0 and requires the following packages:

  • ALRA_0.0.0.9000
  • Biobase_2.58.0
  • BiocGenerics_0.44.0
  • CARD_1.0
  • EpiDISH_2.14.1
  • GenomeInfoDb_1.34.9
  • GenomicRanges_1.50.2
  • ggplot2_3.5.2
  • IRanges_2.32.0
  • LaplacesDemon_16.1.6
  • LIMMA_3.54.2
  • loo_2.6.0
  • MatrixGenerics_1.10.0
  • matrixStats_1.1.0
  • MuSiC_1.0.0
  • nnls_1.5
  • quadprog_1.5-8
  • rdist_0.0.5
  • reshape2_1.4.4
  • S4Vectors_0.36.2
  • SummarizedExperiment_1.28.0
  • TOAST_1.12.0